A Toxicogenomic Approach for the Prediction of Murine Hepatocarcinogenesis Using Ensemble Feature Selection
Abstract
The current strategy for identifying the carcinogenicity of drugs involves the 2-year bioassay in male and female rats and mice. As this assay is costly and time-consuming, there is strong interest in developing approaches for the screening and prioritization of drug candidates in preclinical safety evaluations. Predictive models based on toxicogenomics investigations after short-term exposure have shown potential for assessing carcinogenic risk. In this study, we investigated a novel method for the evaluation of toxicogenomics data based on ensemble feature selection in conjunction with bootstrapping, with the aim of deriving reproducible and characteristic multi-gene signatures. This method was evaluated on a microarray dataset containing global gene expression data from liver samples of both male and female mice. The dataset was generated by the IMI MARCAR consortium and included gene expression profiles of genotoxic and nongenotoxic hepatocarcinogens obtained after treatment of CD-1 mice for 3 or 14 days. We developed predictive models based on gene expression data of both sexes and used them to predict the carcinogenic class of diverse compounds. Comparing the predictivity of our multi-gene signatures against signatures from the literature, we demonstrated that a representative set of state-of-the-art supervised learning methods achieves, on average, slightly higher accuracy when our gene sets are used as features. The constructed models were also used to classify cyproterone acetate (CPA), Wy-14643 (WY) and thioacetamide (TAA), whose primary mechanism of carcinogenicity remains controversial. Based on the extracted mouse liver gene expression patterns, CPA would be predicted as a nongenotoxic compound. In contrast, both WY and TAA would be classified as genotoxic mouse hepatocarcinogens.
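The general idea of ensemble feature selection combined with bootstrapping can be illustrated with a minimal sketch. The abstract does not state which ranking methods, signature size, or number of bootstrap rounds the study used, so everything below is an assumption made for illustration: the two rankers (a Welch t-statistic and a Pearson correlation), the function names, the parameters `k` and `n_boot`, and the synthetic data standing in for expression profiles.

```python
import numpy as np

def welch_t_scores(X, y):
    """Absolute Welch t-statistic per feature for binary labels y (0/1)."""
    a, b = X[y == 0], X[y == 1]
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / (se + 1e-12)

def correlation_scores(X, y):
    """Absolute Pearson correlation between each feature and the label."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum()) + 1e-12
    return np.abs(Xc.T @ yc) / denom

def ensemble_feature_selection(X, y, k=5, n_boot=50, seed=0):
    """Count how often each feature lands in the top k across rankers and bootstraps."""
    rng = np.random.default_rng(seed)
    rankers = (welch_t_scores, correlation_scores)  # hypothetical choice of rankers
    counts = np.zeros(X.shape[1])
    runs = 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(y), size=len(y))  # resample with replacement
        Xb, yb = X[idx], y[idx]
        # Skip degenerate resamples with fewer than two samples in a class.
        if min((yb == 0).sum(), (yb == 1).sum()) < 2:
            continue
        for ranker in rankers:
            top = np.argsort(ranker(Xb, yb))[-k:]
            counts[top] += 1
            runs += 1
    freq = counts / max(runs, 1)  # selection frequency in [0, 1]
    return np.argsort(freq)[::-1][:k], freq

# Synthetic stand-in for expression data: 40 samples, 200 "genes",
# of which the first five are shifted in class 1 (not real measurements).
rng = np.random.default_rng(42)
y = np.repeat([0, 1], 20)
X = rng.normal(size=(40, 200))
X[y == 1, :5] += 3.0

selected, freq = ensemble_feature_selection(X, y, k=5, n_boot=50)
print("selected features:", sorted(selected.tolist()))
```

Features that are chosen in most rounds form a more stable signature than a single ranking on the full dataset would give, which is the reproducibility argument the abstract makes; the specific threshold or signature size would need to be tuned for real data.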
Related resources
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
With the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking, and its danger is ever increasing. This paper proposes an advanced data mining method that considers both feature selection and decision cost to improve the accuracy of credit card fraud detection. After selecting the best and most effec...
Development of an Ensemble Multi-stage Machine for Prediction of Breast Cancer Survivability
Prediction of cancer survivability using machine learning techniques has become a popular approach in recent years. An important issue here is that obtaining some features may require difficult and costly experiments, even though these features have little impact on the final decision and can be omitted from the feature set. Therefore, developing a machine for p...
The Usefulness of Ensemble Regressions and Optimal Predictor Variable Selection Methods in Predicting Stock Returns
This paper examines the usefulness of ensemble regressions and optimal predictor variable selection methods (including the correlation-based method and Relief) for predicting the stock returns of companies listed on the Tehran Stock Exchange. To assess the performance of ensemble regression, the evaluation metrics (including mean absolute percentage error, root mean squared error, and the coefficient of determination) for this method's predictions, with linear regression and artificial neural networks...
The prediction of lymphedema via the combination of the selected data mining algorithms
Background: Breast cancer is the second leading cause of cancer death in women, after lung cancer. Given the importance of predicting this disease, the use of data mining methods in medical research is more significant than before. Data mining algorithms can be a great help in preventing the development of lymphedema in patients. The aim of this study was to create a diagnosis system that can ...
Feature Selection for Small Sample Sets with High Dimensional Data Using Heuristic Hybrid Approach
Feature selection can be decisive when analyzing high dimensional data, especially with a small number of samples. Feature extraction methods do not perform well under these conditions. With small sample sets and high dimensional data, exploring a large search space and learning from insufficient samples becomes extremely hard. As a result, neural networks and clustering a...